Abstract

This paper presents a new lower bound for the discrete strategy
improvement algorithm for solving parity games due to Vöge and
Jurdziński. First, we informally show which structures are
difficult for the algorithm to solve. Second, we outline a family
of games of quadratic size on which the algorithm requires
exponentially many strategy iterations, answering in the negative
the long-standing question of whether this algorithm runs in
polynomial time. Additionally, we note that the same family of
games can be used to prove a similar result w.r.t. the strategy
improvement variant by Schewe.

1. Introduction 

Parity games are simple two-player games of perfect in- 
formation played on directed graphs whose nodes are la- 
beled with natural numbers, called priorities. A play in a 
parity game is an infinite sequence of nodes whose winner 
is determined by all priorities occurring infinitely often. In 
fact, it depends on the parity of the highest priority that oc- 
curs infinitely often, giving parity games their name. 

Parity games occur in several fields of theoretical computer
science, e.g. as a solution to the problem of complementation or
determinisation of tree automata JU Q] or as an algorithmic
backend to the model checking problem of the modal
$\mu$-calculus ll2l [T4ll.

There are many algorithms that solve parity games, such as the
recursive decomposing algorithm due to Zielonka ifTTIl and its
recent improvement by Jurdziński, Paterson and Zwick [8], the
small progress measures algorithm due to Jurdziński [7] with its
recent improvement by Schewe ifTTI, the model-checking algorithm
due to Stevens and Stirling [13] and finally the two strategy
improvement algorithms by Vöge and Jurdziński [16] and
Schewe [12].



All mentioned algorithms except for the two strategy im- 
provement algorithms have been shown to have a super- 
polynomial worst-case runtime complexity at best or there 
is at least little doubt that their worst-case runtime complex- 
ity is super-polynomial or even exponential. 

Solving parity games is one of the few problems that belongs to
the complexity class NP $\cap$ coNP and that is not (yet) known to
belong to P [2]. It has also been shown that solving parity games
belongs to UP $\cap$ coUP [6]. The currently best known upper
bound on the deterministic solution of parity games is
$O(|E| \cdot |V|^{\frac{1}{3}|\mathrm{ran}\,\Omega|})$ due to
Schewe's big-step algorithm lITTl.

In this paper, we present a family of parity games comprising a
linear number of nodes and a quadratic number of edges such that
the strategy improvement algorithm by Vöge and Jurdziński requires
an exponential number of iterations on them. Consequently, the
algorithm requires at least super-polynomial time to solve parity
games in the worst case. Due to page restrictions, we will only
study the original strategy improvement algorithm here, but we
remark that the same result can be shown for the strategy
improvement variant due to Schewe using the same family of games.

Section 2 defines the basic notions of parity games and some
notations that are employed throughout the paper. Section 3
recaps the strategy improvement algorithm by Vöge and Jurdziński.
In Section 4 we present two graph structures that are difficult
for strategy iteration algorithms to solve. Section 5 outlines a
family of games on which the algorithm requires an exponential
number of iterations.

2. Parity Games 

A parity game is a tuple $G = (V, V_0, V_1, E, \Omega)$ where
$(V, E)$ forms a directed graph whose node set is partitioned
into $V = V_0 \cup V_1$ with $V_0 \cap V_1 = \emptyset$, and
$\Omega : V \to \mathbb{N}$ is the priority function that assigns
to each node a natural number called the priority of the node. We
assume the underlying graph to be total, i.e. for every $v \in V$
there is a $w \in V$ s.t. $(v, w) \in E$. In the following we
will restrict ourselves to finite parity games.

We also use infix notation $vEw$ instead of $(v, w) \in E$ and
define the set of all successors of $v$ as
$vE := \{w \mid vEw\}$. The size $|G|$ of a parity game
$G = (V, V_0, V_1, E, \Omega)$ is defined to be the cardinality
of $E$, i.e. $|G| := |E|$; since we assume parity games to be
total w.r.t. $E$, this is a reasonable way to measure the size.
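
The definitions above can be mirrored in a small data structure. The following is only a sketch: the class name `ParityGame`, its field names and the toy game are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParityGame:
    """Hypothetical container for G = (V, V_0, V_1, E, Omega)."""
    owner: dict       # node -> 0 or 1 (the player owning the node)
    edges: frozenset  # set of (v, w) pairs, i.e. the relation E
    priority: dict    # node -> natural number, i.e. Omega

    def successors(self, v):
        """The set vE of all successors of v."""
        return {w for (u, w) in self.edges if u == v}

    def is_total(self):
        """Totality: every node has at least one outgoing edge."""
        return all(self.successors(v) for v in self.owner)

    def size(self):
        """|G| := |E|."""
        return len(self.edges)


# Illustrative three-node game (not taken from the paper).
toy = ParityGame(
    owner={"a": 0, "b": 1, "c": 0},
    edges=frozenset({("a", "b"), ("b", "a"), ("b", "c"), ("c", "a")}),
    priority={"a": 2, "b": 3, "c": 4},
)
```

Storing the owner as a map from nodes to players keeps the partition $V = V_0 \cup V_1$ disjoint by construction.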

The game is played between two players called 0 and 1: Starting
in a node $v_0 \in V$, they construct an infinite path through
the graph as follows. If the construction so far has yielded a
finite sequence $v_0 \dots v_n$ and $v_n \in V_i$ then player $i$
selects a $w \in v_n E$ and the play continues with the sequence
$v_0 \dots v_n w$.

Every play has a unique winner given by the parity of the
greatest priority that occurs infinitely often in a play. The
winner of the play $v_0 v_1 v_2 \dots$ is player $i$ iff
$\max\{p \mid \forall j \in \mathbb{N}\ \exists k \geq j :
\Omega(v_k) = p\} \equiv_2 i$ (where $i \equiv_k j$ holds iff
$|i - j| \bmod k = 0$). That is, player 0 tries to make an even
priority occur infinitely often without any greater odd
priorities occurring infinitely often, player 1 attempts the
converse.
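
For a play that eventually repeats a fixed cycle forever, exactly the priorities on that cycle occur infinitely often, so the winning condition reduces to a parity check. A minimal sketch, assuming such an ultimately periodic play; the function name `winner_of_lasso` is hypothetical:

```python
def winner_of_lasso(priority, cycle):
    """Winner of a play that eventually repeats `cycle` forever.

    Only the priorities on the repeated cycle occur infinitely often,
    so the winner is the parity of the greatest one among them.
    """
    if not cycle:
        raise ValueError("the repeated part of the play must be non-empty")
    return max(priority[v] for v in cycle) % 2


priority = {"u": 5, "v": 2, "w": 4}
even_cycle_winner = winner_of_lasso(priority, ["v", "w"])  # greatest priority 4
odd_cycle_winner = winner_of_lasso(priority, ["u", "v"])   # greatest priority 5
```

The finite prefix of the play is irrelevant, which is why it is not a parameter.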

A graphical depiction of a parity game here is based on its
directed graph where nodes owned by player 0 are drawn as circles
and nodes owned by player 1 are drawn as rectangles;
additionally, all nodes are labelled with their respective
priority, and - if applicable - with their name.

A strategy for player $i$ is a function $\sigma : V_i \to V$,
s.t. for all $v \in V_i$ it holds that $vE\sigma(v)$. A play
$v_0 v_1 \dots$ conforms to a strategy $\sigma$ for player $i$ if
for all $j$ we have: if $v_j \in V_i$ then
$v_{j+1} = \sigma(v_j)$.

Intuitively, conforming to a strategy means to always make those
choices that are prescribed by the strategy. A strategy $\sigma$
for player $i$ is a winning strategy starting in some node
$v \in V$ if player $i$ wins every play that conforms to this
strategy and begins in $v$. We say that player $i$ wins the game
$G$ starting in $v$ iff the player has a winning strategy for $G$
starting in $v$.

With $G$ we associate two sets $W_0, W_1 \subseteq V$ with the
following definition. $W_i$ is the set of all nodes $v$ s.t.
player $i$ wins the game $G$ starting in $v$.

It is not obvious that every node should belong to either $W_0$
or $W_1$. However, this is indeed the case and known as
determinacy: a player has a winning strategy for a game iff the
opponent does not have a winning strategy for that game.

Theorem 1. itgJEI \B Let $G = (V, V_0, V_1, E, \Omega)$ be a
parity game. Then $W_0 \cap W_1 = \emptyset$ and
$W_0 \cup W_1 = V$.

A strategy $\sigma$ for player $i$ induces a strategy subgame
$G|_\sigma := (V, V_0, V_1, E|_\sigma, \Omega)$ where
$E|_\sigma := \{(u, v) \in E \mid u \in \mathrm{dom}(\sigma)
\Rightarrow \sigma(u) = v\}$. Such a subgame $G|_\sigma$ is
basically the same game as $G$ with the restriction that whenever
$\sigma$ provides a strategy decision for a node $u \in V_i$ all
transitions from $u$ but $\sigma(u)$ are no longer accessible.
The set of strategies for player $i$ is denoted by
$\mathcal{S}_i(G)$.

Without loss of generality we assume $\Omega$ to be injective,
i.e. there are no two different nodes having the same priority.
We also assume that parity games do not contain any self-cycles
or, if so, that they are replaced by equivalent two-node cycles
before passing the game to the strategy improvement algorithm.

3. Strategy Improvement 

First, we briefly recap the basic definitions of the strategy
improvement algorithm. For a given parity game
$G = (V, V_0, V_1, E, \Omega)$, the reward of node $v$ is defined
as follows: $\mathrm{rew}_G(v) := \Omega(v)$ if
$\Omega(v) \equiv_2 0$ and $\mathrm{rew}_G(v) := -\Omega(v)$
otherwise. The set of profitable nodes for player 0 is defined to
be $V_\oplus := \{v \in V \mid \mathrm{rew}_G(v) > 0\}$ and
$V_\ominus := \{v \in V \mid \mathrm{rew}_G(v) < 0\}$ likewise
for player 1.

The relevance ordering $<$ on $V$ is induced by $\Omega$:
$v < u :\iff \Omega(v) < \Omega(u)$; additionally one defines the
reward ordering $\prec$ on $V$ by
$v \prec u :\iff \mathrm{rew}_G(v) < \mathrm{rew}_G(u)$. Note
that both orderings are total due to injectivity of the priority
function.
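
The two orderings can be illustrated by sorting the nodes of a small example. This is a sketch only; the helper names and the four-node priority map are assumptions made for illustration.

```python
def reward(priority, v):
    """rew(v): the priority itself if it is even, its negation otherwise."""
    p = priority[v]
    return p if p % 2 == 0 else -p


def by_relevance(priority, nodes):
    """Nodes in ascending relevance order (plain priority order)."""
    return sorted(nodes, key=lambda v: priority[v])


def by_reward(priority, nodes):
    """Nodes in ascending reward order."""
    return sorted(nodes, key=lambda v: reward(priority, v))


priority = {"a": 1, "b": 2, "c": 3, "d": 4}
relevance_order = by_relevance(priority, priority)
reward_order = by_reward(priority, priority)
```

Under the reward ordering, high odd priorities come first (they are worst for player 0) and high even priorities come last.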

A loopless path in $G$ is an injective map
$\pi : \{0, \dots, k - 1\} \to V$ conforming with $E$, i.e.
$\pi(i) E \pi(i + 1)$ for every $i < k - 1$. The length of a
loopless path is denoted by $|\pi| := k$. The set of loopless
paths $\pi$ in a game $G$ originating from the node $v$ (i.e.
$\pi(0) = v$) is denoted by $\Pi_G(v)$. We sometimes write
$\pi = v_0 \dots v_{k-1}$ to denote the loopless path
$\pi : i \mapsto v_i$.

A node $v$ in $G$ is called a dominating cycle node iff there is
a loopless path $\pi \in \Pi_G(v)$ s.t.
$\pi(|\pi| - 1) E \pi(0)$ and
$\max\{\Omega(\pi(i)) \mid i < |\pi|\} = \Omega(v)$. The set of
dominating cycle nodes is denoted by $C_G$.

A key point of the strategy improvement algorithm is to 
assign to each node in the game graph a valuation. Basi- 
cally, a valuation describes a loopless path originating from 
its node to a dominating cycle node. Such a valuation con- 
sists of three parts: The dominating cycle node, the set of 
more relevant nodes (w.r.t. the cycle node) on the loopless 
path leading to the cycle and the length of the loopless path 
(which measures the amount of less relevant nodes). 

To compare the second component of a valuation - the set of nodes
on the way to the cycle - we introduce a total ordering $\prec$
on $2^V$: To determine which set of nodes is better w.r.t.
$\prec$, one investigates the node with the highest priority that
occurs in only one of the two sets. The set owning that node is
greater than the other if and only if that node has an even
priority. More formally:

$M \prec N :\iff
(M \triangle N \neq \emptyset \wedge \max_<(M \triangle N) \in N \cap V_\oplus) \vee
(M \triangle N \neq \emptyset \wedge \max_<(M \triangle N) \in M \cap V_\ominus)$

where $M \triangle N$ denotes the symmetric difference of both
sets.
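
The set ordering can be sketched directly from its definition. The function name `set_less` and the example sets are illustrative assumptions; nodes are identified with their (injective) priorities purely for brevity.

```python
def set_less(priority, m, n):
    """Return True iff M is strictly below N in the ordering on node sets."""
    diff = set(m) ^ set(n)  # symmetric difference
    if not diff:
        return False        # equal sets are not strictly ordered
    # Most relevant node occurring in exactly one of the two sets.
    top = max(diff, key=lambda v: priority[v])
    is_even = priority[top] % 2 == 0
    return (top in n and is_even) or (top in m and not is_even)


# Nodes are named by their priority in this toy example.
priority = {i: i for i in range(1, 7)}
```

For instance, with `M = {2, 5}` and `N = {2, 4}` the deciding node is 5: it is odd and lies in `M`, so `M` is the smaller set.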

A loopless path $\pi = v_0 \dots v_k$ induces a node valuation
for the node $v_0$ as follows:

$\vartheta_\pi := (v_k, \{v_i \mid v_k < v_i\}, k)$

A node valuation $\vartheta$ for a node $v$ is a triple
$(c, M, l) \in V \times 2^V \times |V|$ such that there is a
loopless path $\pi$ with $\pi(0) = v$ and
$\vartheta_\pi = \vartheta$.

We extend the total ordering on sets of nodes to node valuations:

$(u, M, e) \prec (v, N, f) :\iff
(u \prec v) \vee (u = v \wedge M \prec N) \vee
(u = v \wedge M = N \wedge e < f \wedge u \in V_\ominus) \vee
(u = v \wedge M = N \wedge e > f \wedge u \in V_\oplus)$
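
The lexicographic flavour of this ordering becomes clearer in code. The following sketch compares two node valuations given as triples; all names are hypothetical, and the set comparison is repeated so that the block is self-contained.

```python
def reward(priority, v):
    """rew(v): the priority itself if it is even, its negation otherwise."""
    p = priority[v]
    return p if p % 2 == 0 else -p


def set_less(priority, m, n):
    """Strict ordering on node sets (second valuation component)."""
    diff = m ^ n
    if not diff:
        return False
    top = max(diff, key=lambda v: priority[v])
    is_even = priority[top] % 2 == 0
    return (top in n and is_even) or (top in m and not is_even)


def valuation_less(priority, x, y):
    """Return True iff the node valuation x is strictly below y."""
    (u, m, e), (v, n, f) = x, y
    if u != v:                      # first component: reward ordering
        return reward(priority, u) < reward(priority, v)
    if m != n:                      # second component: set ordering
        return set_less(priority, m, n)
    r = reward(priority, u)         # third component: path length
    # Odd cycle node: longer paths are better; even: shorter are better.
    return (e < f and r < 0) or (e > f and r > 0)


priority = {i: i for i in range(1, 7)}
empty = frozenset()
```

The third component is only consulted when cycle node and node set coincide, and its direction depends on who profits from the cycle node.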

A game valuation is a map $\Xi : V \to V \times 2^V \times |V|$
assigning each $v \in V$ a node valuation. A partial ordering on
game valuations is defined as follows:

$\Xi \lhd \Xi' :\iff (\forall v \in V : \Xi(v) \preceq \Xi'(v))
\wedge (\Xi \neq \Xi')$

Game valuations are used to measure the performance of a strategy
of player 0: For a fixed strategy $\sigma$ of player 0 and a node
$v$, the associated valuation basically states which is the worst
cycle that can be reached from $v$ conforming with $\sigma$ as
well as the worst loopless path leading to that cycle (also
conforming with $\sigma$). Intuitively, the associated valuation
reflects the best counter-strategy player 1 could play.

A strategy $\sigma$ of player 0 therefore can be evaluated as
follows:

$\Xi_\sigma : v \mapsto \min_\prec \{\vartheta_\pi \mid
\pi \in \Pi_{G|_\sigma}(v) \wedge
\pi(|\pi| - 1) \in C_{G|_\sigma}\}$

Lemma 2. ([16]) A valuation of a strategy can be computed in
polynomial time.

A valuation $\Xi_\sigma$ originating from a strategy $\sigma$ can
be used to create a new strategy of player 0. The strategy
improvement algorithm only allows to select new strategy
decisions for player 0 occurring in the improvement arena
$\mathcal{A}_{G,\sigma} := (V, V_0, V_1, E', \Omega)$ where

$vE'u :\iff vEu \wedge (v \in V_1 \vee (v \in V_0 \wedge
\Xi_\sigma(\sigma(v)) \preceq \Xi_\sigma(u)))$



Thus all edges performing worse than the current strategy are
removed from the game. A strategy $\sigma$ is improvable iff
there is a node $v \in V_0$, a node $u \in V$ with $vEu$ and
$\sigma(v) \neq u$ s.t.
$\Xi_\sigma(\sigma(v)) \preceq \Xi_\sigma(u)$.

An improvement policy now selects a strategy for player 0 in a
given improvement arena w.r.t. a valuation originating from a
strategy. More formally: An improvement policy is a map
$\mathcal{I}_G : \mathcal{S}_0(G) \to \mathcal{S}_0(G)$
fulfilling the following two conditions for every strategy
$\sigma$:

1. For every node $v \in V_0$ it holds that
$(v, \mathcal{I}_G(\sigma)(v))$ is an edge in
$\mathcal{A}_{G,\sigma}$.

2. If $\sigma$ is improvable then
$\Xi_\sigma \neq \Xi_{\mathcal{I}_G(\sigma)}$.

Jurdziński and Vöge proved in their work that every strategy that
is improved by an improvement policy can only result in
strategies with valuations being better (w.r.t. $\lhd$) than the
valuation of the original strategy.

Theorem 3. ([16]) Let $G$ be a parity game, $\sigma$ be an
improvable strategy and $\mathcal{I}_G$ be an improvement policy.
Let $\sigma' = \mathcal{I}_G(\sigma)$. Then
$\Xi_\sigma \lhd \Xi_{\sigma'}$.

If a strategy is not improvable, the strategy iteration 
comes to an end. 

Theorem 4. M&j Let $G$ be a parity game and $\sigma$ be a
non-improvable strategy. Then the following holds:

1. $W_0 = \{v \mid \Xi_\sigma(v) = (w, \_, \_) \wedge w \in V_\oplus\}$

2. $W_1 = \{v \mid \Xi_\sigma(v) = (w, \_, \_) \wedge w \in V_\ominus\}$

3. $\sigma$ is a winning strategy for player 0 on $W_0$

4. $\tau : v \in V_1 \mapsto \min_\prec U_\sigma(v)$ is a winning
strategy for player 1 on $W_1$ where
$U_\sigma(v) = \{u \in vE \mid \forall w \in vE :
\Xi_\sigma(u) \preceq \Xi_\sigma(w)\}$.

The strategy iteration starts with an initial strategy $\iota_G$
and runs for a given improvement policy $\mathcal{I}_G$ as
follows.

Algorithm 1 Strategy Iteration

    $\sigma \leftarrow \iota_G$
    while $\sigma$ is improvable do
        $\sigma \leftarrow \mathcal{I}_G(\sigma)$
    end while
    return $W_0, W_1, \sigma, \tau$ as in Theorem 4
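
The control flow of Algorithm 1 is independent of how valuations are computed. The following skeleton is a sketch under that assumption: the improvability test and the improvement policy are passed in as hypothetical callbacks, and the loop additionally counts iterations, the quantity this paper is concerned with.

```python
def strategy_iteration(initial, is_improvable, improve):
    """Generic strategy iteration loop.

    `initial` is the initial strategy, `is_improvable` decides whether a
    strategy can still be improved, and `improve` applies the improvement
    policy once. Returns the final strategy and the number of iterations.
    """
    sigma = initial
    iterations = 0
    while is_improvable(sigma):
        sigma = improve(sigma)
        iterations += 1
    return sigma, iterations


# Toy stand-in: "strategies" are integers that can be improved up to 5.
final, steps = strategy_iteration(0, lambda s: s < 5, lambda s: s + 1)
```

Termination is not guaranteed by the skeleton itself; for parity games it follows from Theorem 3 together with the finiteness of the strategy set.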



The initial strategy can be selected in several ways. We focus on
a very easy method here, always selecting the node with the best
reward.

The initial strategy in this paper hence will be selected as
follows:

$\iota_G : v \in V_0 \mapsto \max_\prec \{u \mid vEu\}$
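
A minimal sketch of this choice, assuming the game is given by three plain dictionaries (owner, successor lists and priorities); the function name and the toy game are illustrative only.

```python
def initial_strategy(owner, successors, priority):
    """For every player 0 node pick the successor with the best reward."""
    def reward(v):
        p = priority[v]
        return p if p % 2 == 0 else -p

    return {
        v: max(successors[v], key=reward)
        for v in owner
        if owner[v] == 0
    }


owner = {"a": 0, "b": 1, "c": 0}
successors = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
priority = {"a": 4, "b": 3, "c": 2}
iota = initial_strategy(owner, successors, priority)
```

Because priorities are assumed to be injective, the maximum is unique and the resulting strategy does not depend on the order of the successor lists.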

The improvement policy we are following in this paper is the
locally optimizing policy $\mathcal{I}_G^{\mathrm{loc}}$ due to
Jurdziński and Vöge.

It simply selects the most profitable strategy decision with
respect to the current valuation:

$\mathcal{I}_G^{\mathrm{loc}}(\sigma) : v \in V_0 \mapsto
\max_\prec U_\sigma(v)$

where $U_\sigma(v) = \{u \in vE \mid \forall w \in vE :
\Xi_\sigma(w) \preceq \Xi_\sigma(u)\}$.

Lemma 5. ([16]) The locally optimizing policy can be computed in
polynomial time.

We will also refer to the globally optimizing policy by
Schewe [12]; however, due to page restrictions, we cannot go into
detail here.

4. Critical Graphs 

Strategy iteration usually performs very well on randomly
generated games, on game families considered difficult, and on
families based on practical examples. However, there are
structures that confuse it. We will use two of them to construct
a binary counter, leading to a family of games of quadratic size
that requires an exponential number of iterations. These two
structures will be referred to as deceleration lanes and stubborn
cycles.

The deceleration lane is a family of structures that comprise two
nodes, $s$ and $x$, having outgoing edges to the rest of the
game, a lane of nodes, $a_k, \dots, a_0, c$, having incoming
edges from the rest of the game, an internal parallel lane of
nodes, $b_k, \dots, b_0, d$, and finally an internal node $t$.
See Figure 1 for an example of a deceleration lane.




Figure 1. A Deceleration Lane 

The initial setting for a deceleration lane would be a strategy
that maps all player 0 nodes to $x$. Following a run of the
strategy improvement algorithm on the whole graph, consider a
setting in which the valuation of $x$ remains greater than the
one of $s$. In each such iteration, only one edge of the
deceleration lane is a proper improvement edge: At first, the
edge from $b_0$ to $d$, then the edge from $b_1$ to $b_0$ etc.

At the same time, after updating to the improvement edge, there
is always a new node - meaning one in each iteration - accessible
from the outside in the deceleration lane that has the highest
valuation. In the beginning, the best accessible node is $c$,
then $a_0$ and after that $a_1$ etc. To summarize these two
properties:

• It takes time linear in its size to be complete: Start- 
ing with the initial strategy, each iteration updates one 
strategy decision. 

• It comprises a new best-valuated node in each itera- 
tion: Namely the node using the updated strategy edge. 

There is another important feature of deceleration lanes: 
Whenever the valuation of s gets better than the one of x, 
all nodes immediately switch to s, except for d (since in this 
case, the edge from d to t is not an improvement edge). 

Therefore, the whole strategy-structure of the deceleration lane
can be reset simply by valuating $s$ higher than $x$, even if it
is only for the duration of one iteration. After that, the
deceleration lane can restructure itself in the way described
before.

The awkward construction using the node t to postpone 
the update of d is not necessary to "fool" the locally improv- 
ing policy (it suffices to have an edge from d to s, omitting t 
completely); Schewe's globally improving policy, however, 
requires it. To sum this property up: 

• It resets in one step: If there is an event valuating s 
better than x, the whole sub-strategy associated with 
the deceleration lane immediately gets reset by the im- 
provement policy. 

Figure 2. Activity of a Deceleration Lane

Figure 2 illustrates the update activity of the deceleration
lane: The first column shows the sequence of strategies
associated with a run of the strategy iteration on a game
containing the deceleration lane, the second column shows which
of the two nodes having edges leading out of the lane has a
better valuation, the last column assigns a macroscopic title to
each line (we will refer to these later) and the other columns
show the strategy decisions of the current strategy. Note that
the external event that resets the lane takes two iterations in
this example.

A deceleration lane is used to absorb the update activity of
other nodes in such a way that wise strategy updates are
postponed. A simple scenario would be a cycle of a player 0 node
and a player 1 node: Assume that a wise strategy for player 0 is
to move to the player 1 node s.t. player 1 is forced to leave the
cycle; see Figure 3 for an example of such a situation.




Figure 3. Usage Example (a cycle with an edge leading to the
deceleration lane)

Player 0 will update to this edge iff there is no edge leading
out of the cycle that is better than the edge used before. A
deceleration lane thus is a device that fulfills these needs,
with the added benefit of being reusable due to its ability to
reset itself.

Although such a simple cycle can be used to fool the locally
optimizing policy in combination with a deceleration lane, this
is not the case with the globally optimizing policy. We are not
able to go into detail due to page restrictions; we therefore
phrase the problem as follows: If we want to postpone player 0
updates closing a cycle, we basically need to make sure that
closing the cycle cannot be done in a single iteration.

This can be achieved by creating a cycle consisting of more than
one player 0 node s.t. there is always at least one edge
belonging to the cycle that is no improvement edge (as long as
closing the cycle should be postponed). A stubborn cycle uses
three player 0 nodes, say $e$, $f$ and $g$, and one player 1
node, say $h$, to form a cycle.

All three player 0 nodes provide further outgoing edges leading
to other structures, for instance to a deceleration lane, in the
game graph. We say a stubborn cycle is $\sigma$-closed (w.r.t. a
strategy $\sigma$) iff all player 0 edges belonging to the cycle
are conforming to $\sigma$. See Figure 4 for an example of a
stubborn cycle.

Figure 4. A Stubborn Cycle

To postpone the closing of the cycle, one has to create a setting
in which the best outgoing improvement has to change in each
iteration, e.g. in the first iteration, the best edge goes from
$e$ to somewhere else, in the second iteration, the best edge
goes from $f$ to somewhere else, in the third iteration, the best
edge goes from $g$ to somewhere else and thereafter the best edge
goes again from $e$ to somewhere else etc.

By providing such a setting, the stubborn cycle maintains the
invariant that one of the three player 0 edges building the cycle
is a non-improvement edge. In each iteration, the single edge
belonging to the cycle is improved to lead out of the cycle while
one of the currently two edges leading out of the cycle is
improved to lead into the cycle.

A stubborn cycle can therefore be combined with a deceleration
lane as follows: $e$ is connected to the 0th, 3rd, 6th, etc.
entry node of the deceleration lane, $f$ is connected to the 2nd,
5th etc. entry node and $g$ is connected to the 1st, 4th, etc.
entry node.
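
The connection pattern is a simple residue-class assignment, sketched below. The function name is hypothetical, and entry nodes are identified by their index in the lane.

```python
def cycle_node_for_entry(index):
    """Stubborn-cycle node connected to the entry node with this index.

    e takes indices 0, 3, 6, ...; g takes 1, 4, ...; f takes 2, 5, ...
    """
    if index < 0:
        raise ValueError("entry node indices start at 0")
    return "egf"[index % 3]


assignment = [cycle_node_for_entry(i) for i in range(7)]
```

Each of the three cycle nodes is thus connected to every third entry node, so each step up the lane is reachable from exactly one of them.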

Figure 5 illustrates the update activity of a stubborn cycle: The
first column shows the sequence of strategies associated with a
run of the strategy iteration on a game containing the stubborn
cycle, the second column shows the associated valuation ranking
of three nodes $o_e$, $o_f$ and $o_g$ that can be reached by the
respective outgoing edges of the player 0 nodes belonging to the
stubborn cycle (in combination with a deceleration lane, there is
usually more than one node of the lane that can be reached by a
player 0 node of the stubborn cycle). The other columns denote
the status of the respective edge w.r.t. the current strategy:
Strategy (S), Improvement Edge (I) and Non-Improvement Edge (N).

Figure 5. Activity of a Stubborn Cycle



Again, we note that we only use stubborn cycles instead of simple
cycles in order to create a worst-case family that works both for
the locally improving policy and for the globally improving
policy.

5. Super-Polynomial Lower Bound 

We present a family of parity games of quadratic size requiring
exponentially many iterations to be solved by the strategy
improvement algorithm. The games are denoted by $G_n$. The set of
nodes is $V_n := V_n^0 \cup V_n^1$, where $V_n^i$ denotes the set
of nodes owned by player $i$:

$V_n^0 := \{y, d, s, t, x, w, b_0, \dots, b_{3n},
e_0, \dots, e_{n-1}, f_0, \dots, f_{n-1}, g_0, \dots, g_{n-1},
l_0, \dots, l_{n-1}, z_0, \dots, z_{n-1}\}$

$V_n^1 := \{c, p, q, a_0, \dots, a_{3n}, h_0, \dots, h_{n-1},
k_0, \dots, k_{n-1}, m_0, \dots, m_{n-1}\}$

Please refer to Figure 6 for the priority function and the edges
of $G_n$. The game $G_2$ is depicted in Figure 7.

Figure 6. The Game $G_n$



Corollary 6. The game $G_n$ has $14 \cdot n + 11$ nodes,
$3 \cdot n^2 + 28 \cdot n + 17$ edges and $16 \cdot n + 16$ as
highest priority. In particular, $|G_n| = O(n^2)$.
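
The node count can be cross-checked against the node sets of $G_n$. The sketch below only enumerates node names; it assumes that the indexed families $a_i$ and $b_i$ range over $0, \dots, 3n$ inclusive and all other families over $0, \dots, n-1$. The edges and priorities are given in Figure 6 and are not reproduced here.

```python
def nodes_of_game(n):
    """Return (player 0 nodes, player 1 nodes) of the n-th game as names."""
    def indexed(name, count):
        return [f"{name}{i}" for i in range(count)]

    player0 = ["y", "d", "s", "t", "x", "w"] + indexed("b", 3 * n + 1)
    for name in "efglz":
        player0 += indexed(name, n)

    player1 = ["c", "p", "q"] + indexed("a", 3 * n + 1)
    for name in "hkm":
        player1 += indexed(name, n)

    return player0, player1


p0, p1 = nodes_of_game(2)
```

Player 0 owns $8n + 7$ of the nodes and player 1 owns $6n + 4$, which adds up to the $14 \cdot n + 11$ stated in the corollary.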



First, we note that every approach trying to construct 
a game family of polynomial size requiring exponentially 
many iterations to be solved, needs to focus on the sec- 
ond component of game valuations: There are only lin- 
early many different values for the first and third component 
while there are exponentially many for the second. 

The games $G_n$ therefore are designed in such a way that the
first and the third component of all occurring game valuations in
a run of the algorithm on $G_n$ remain unaltered: Following the
initial strategy (selecting the successor with the greatest
reward), it is not hard to see that the worst (w.r.t. player 0)
dominating cycle node that can be reached is $q$.

Additionally note that $G_n$ is completely won by player 1,
therefore there is no dominating cycle node that could be reached
during the iteration process that is better than $q$ (since $q$
is the node with the greatest negative reward in $G_n$). Due to
the fact that all other nodes have greater priorities than $q$,
the third component of the valuations (basically measuring the
number of less relevant nodes) remains useless. Thus, we will
only focus on the paths leading to $q$.

The basic idea is to create a binary counter in $G_n$: To
formalise the state of an $n$-bit counter, we use $n$-tuples
$\alpha = (\alpha_{n-1} \dots \alpha_0) \in \{0, 1\}^n$ where
$\alpha_0$ denotes the lowest bit in $\alpha$ and $\alpha_{n-1}$
denotes the highest bit. The lowest and highest values in
$\{0, 1\}^n$ will be abbreviated by $0_n := (0 \dots 0)$ and
$1_n := (1 \dots 1)$.

Generally, the games $G_n$ consist of a long deceleration lane
(being built from the nodes $a_i$, $b_i$, $c$, $d$, $t$, $x$, and
$s$), $n$ stubborn cycles (being built from the nodes $e_i$,
$f_i$, $g_i$ and $h_i$) connected to the deceleration lane,
cycle-associated structures ($k_i$, $l_i$, $m_i$ and $z_i$), the
cycle in which all valuation-occurring paths end ($p$ and $q$) as
well as two additional nodes ($y$ and $w$) being associated with
the deceleration lane.

The sequence of strategies that is associated with a run on $G_n$
can be separated into three phases: An initialization phase (of
length $O(1)$), a counting phase (of length $2^{O(n)}$) and a
finalization phase (of length $O(n)$). The following
clarifications will be focusing on the counting phase. The
counter starts in the counting phase with the lowest bit set and
finishes with all bits set except the lowest bit.

The binary counter is implemented by the sequence of stubborn
cycles. Let $\sigma$ be a fixed strategy occurring during the
counting phase. A $\sigma$-closed stubborn cycle (forcing player
1 to move out of the cycle) should be considered as a set bit in
the counter while a $\sigma$-open stubborn cycle should be
considered as a bit which is not set.

Intuitively, the strategy $\sigma$ can be described as follows:
All nodes belonging to open stubborn cycles follow edges to the
deceleration lane; the deceleration lane follows edges to the
entry point of the lowest closed stubborn cycle which itself
follows edges to the next closed stubborn cycle (via the
cycle-associated structures) etc.; the last closed stubborn cycle
follows edges directly leading to the end of the game.

Let $\alpha \in \{0, 1\}^n$ be the bit-state associated with
$\sigma$ and assume that $\alpha \neq 0_n, 1_n$.

Since we want to implement a counter in the game, the next
bit-state that $\sigma$ changes to during the subsequent
iterations should be $\alpha^+$ (denoting the bit-increment of
$\alpha$; similarly, $\alpha^-$ denotes the bit-decrement of
$\alpha$).

This, basically, can be achieved as follows: All player 0 nodes
belonging to the stubborn cycles have edges pointing out of the
cycles to the deceleration lane, whereas stubborn cycles
associated with bit $i$ only have edges up to $a_{3i+3}$.

All stubborn cycles whose bits are set are connected us- 
ing the Zi nodes. The most profitable path through the cy- 
cle area therefore starts with the lowest cycle whose bit is 
set, running through all other closed stubborn cycles, finally 
ending in p. 

The number of iterations that the counter needs in order to
increment depends on the lowest 0-bit, denoted by
$\mu_\alpha := \min\{j \mid \alpha_j = 0\}$ (keep in mind that we
assume $\alpha \neq 1_n$), and is around
$\gamma_\alpha := 3 \cdot \mu_\alpha + 7$. We will call a
sequence of subsequently improved strategies that are associated
with one $\alpha$ a round.
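
The counter bookkeeping can be sketched with bit-states encoded as integers (bit $j$ of the integer is $\alpha_j$). The function names are hypothetical; `increment` is written in terms of the lowest 0-bit to mirror the intended game behaviour, in which one cycle closes and all lower cycles open.

```python
def lowest_zero_bit(alpha, n):
    """Index of the lowest 0-bit of the n-bit state alpha."""
    for j in range(n):
        if not (alpha >> j) & 1:
            return j
    raise ValueError("all n bits are set, there is no 0-bit")


def round_length(alpha, n):
    """Approximate number of iterations of the round for alpha."""
    return 3 * lowest_zero_bit(alpha, n) + 7


def increment(alpha, n):
    """Bit-increment: set the lowest 0-bit and clear all lower bits."""
    mu = lowest_zero_bit(alpha, n)
    return (alpha | (1 << mu)) & ~((1 << mu) - 1)


example = increment(0b0111, 4)
```

Rounds become longer the more low-order bits are set, since the lowest open cycle has to follow the deceleration lane further before it closes.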

The iterations of a round are mainly absorbed by the update
activity of the deceleration lane: At the beginning of a round,
all player 0 nodes of the deceleration lane,
$b_0, \dots, b_{3n}$, point to the starting node $s$, after that
to $x$ and then subsequently to the respective lower neighbour,
i.e. $b_0$ to $d$, $b_1$ to $b_0$ etc.; we call a node $a_i$ a
$\sigma$-chaining node iff its direct successor $b_i$ already
moves to its lower neighbour, i.e. $\sigma(b_i) = b_{i-1}$ if
$i > 0$ and $\sigma(b_i) = d$ if $i = 0$.

During a round, all closed stubborn cycles remain closed 
until the end of the round since it is more profitable to keep 
a cycle closed than to also move to the deceleration lane. 
All open stubborn cycles basically point to the highest 
chaining node by the current strategy. 

Following the update activity of the deceleration lane, all 
open stubborn cycles are absorbed by updating themselves 
to always reach the highest chaining node. 

The first open stubborn cycle, i.e. the one associated with
$i := \mu_\alpha$, only follows the update activity of the
deceleration lane up to $a_{3i+3}$. After reaching that point,
the next iterations lead to a closing of the $i$-th stubborn
cycle. Additionally, the strategy decision for $k_i$ is improved
to move to the closed stubborn cycle itself. Since all lower
$f_j$ nodes have an edge leading to $k_i$, the next update that
happens is the opening of all lower stubborn cycles, simulating
the correct increment of $\alpha$.

The cycle access node s also updates immediately after 
closing the stubborn cycle to move there, leading to a reset 
of the whole deceleration lane. The node x also updates to 
move there (over the node y or w resp.) one iteration after 
that s.t. the next round is about to begin. 

From a macroscopic point of view, a strategy $\sigma$ occurring
during the counting phase can be roughly described by the
macroscopic state $\mathcal{M}$ of the deceleration lane and the
bit-states of the stubborn cycles. Figure 8 illustrates the
counting phase of Q s . 

Figure 8. Counting Phase



The sequence of strategies associated with a run of the
improvement algorithm (using the locally optimizing policy) on
$G_n$ can be decomposed into three phases:

1. An initialization phase that always requires 11 strategies:
$\sigma^{\iota}_{(n,\beta)}$ where $-2 \leq \beta \leq 8$

2. A counting phase in which flipping a bit requires
$\gamma_\alpha + 3$ strategies for a given counter state
$\alpha$: $\sigma_{(n,\alpha,\beta)}$ where
$\alpha \neq 0_n, 1_n$ and $-2 \leq \beta \leq \gamma_\alpha$

3. A finalization phase that requires $3 \cdot n + 3$ strategies:
$\sigma^{\omega}_{(n,\beta)}$ where
$-2 \leq \beta \leq 3 \cdot n$

The definition of $\sigma^{\iota}_{(n,\beta)}$,
$\sigma_{(n,\alpha,\beta)}$ and $\sigma^{\omega}_{(n,\beta)}$ has
been put into the appendix. The following lemma shows in which
order these strategies are being chosen in a run of the algorithm
on $G_n$.

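Lemma 7. Let $n > 0$. We apply the following notation for every
0-strategy $\sigma$ compliant with $G_n$:
$\sigma' := \mathcal{I}^{\mathrm{loc}}_{G_n}(\sigma)$. Then the
following holds:

1. $(\iota_{G_n})' = \sigma^{\iota}_{(n,-2)}$

2. $(\sigma^{\iota}_{(n,\beta)})' = \sigma^{\iota}_{(n,\beta+1)}$
for every $-2 \leq \beta < 8$

3. $(\sigma^{\iota}_{(n,8)})' = \sigma_{(n,0_n^+,-2)}$

4. $(\sigma_{(n,\alpha,\beta)})' = \sigma_{(n,\alpha,\beta+1)}$
for every $\alpha \neq 0_n, 1_n$ and
$-2 \leq \beta < \gamma_\alpha$

5. $(\sigma_{(n,\alpha,\gamma_\alpha)})' = \sigma_{(n,\alpha^+,-2)}$
for every $\alpha \neq 0_n, 1_n^-, 1_n$

7. $(\sigma^{\omega}_{(n,\beta)})' = \sigma^{\omega}_{(n,\beta+1)}$
for every $-2 \leq \beta < 3 \cdot n$

8. (a? , 0' = ^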

We omit the easy but tedious proofs due to page restric- 
tions here. Technically, one simply needs to compute the 
result of the improvement policy in each step. We show one 
tiny part of the proof in the appendix. 

By induction on n using the former lemma we can con- 
clude: 

Theorem 8. Let $n > 0$. The strategy improvement algorithm
requires $13 \cdot 2^n - 9$ iterations on $G_n$ using the locally
optimizing improvement policy.
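
The closed form can be cross-checked by summing the lengths of the three phases: 11 strategies for initialization, $\gamma_\alpha + 3$ strategies per counter state $\alpha \neq 0_n, 1_n$ with $\gamma_\alpha = 3 \cdot \mu_\alpha + 7$, and $3 \cdot n + 3$ strategies for finalization. The sketch below assumes exactly this decomposition and encodes bit-states as integers; the function names are hypothetical.

```python
def lowest_zero_bit(alpha):
    """Index of the lowest 0-bit of alpha (bit j of the integer is alpha_j)."""
    j = 0
    while (alpha >> j) & 1:
        j += 1
    return j


def total_strategies(n):
    """Number of strategies visited according to the three-phase split."""
    initialization = 11
    counting = sum(
        (3 * lowest_zero_bit(alpha) + 7) + 3
        for alpha in range(1, 2 ** n - 1)  # every state except 0_n and 1_n
    )
    finalization = 3 * n + 3
    return initialization + counting + finalization


totals = {n: total_strategies(n) for n in range(1, 6)}
```

The agreement follows from $\sum_{\alpha \neq 1_n} \mu_\alpha = 2^n - n - 1$, which turns the counting phase into $13 \cdot 2^n - 3 \cdot n - 23$ strategies.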

Hence, the strategy iteration using the locally optimizing policy
requires at least super-polynomial time in the worst case (since
each iteration requires polynomial time and $G_n$ is of quadratic
size; see Lemma 2, Lemma 5 and Corollary 6).

We implemented an open-source parity game solver platform, the
PGSOLVER Collection 0, that particularly contains implementations
of the strategy iteration due to Vöge and Jurdziński [16] as well
as the variant by Schewe [12]. Benchmarking both algorithms with
$G_n$ results in exponential run-time behaviour as can be seen in
Figure 9 (note that the time-axis has logarithmic scale).

6. Conclusion 

Vöge mentions in his PhD thesis [15] that it is probably much
more convenient for strategy improvement algorithms to be
performed on games with an outgoing edge degree limited by two. A
simple transformation of the family of games presented here
resulting in a family of games with out-degree limited by two
also requires super-polynomial time.

There are other possibilities to select the initial strategy.
Randomizing the initial strategy, for instance, is another
popular choice: we note without proof that starting with a
randomized strategy, the expected number of iterations on $G_n$
is also super-polynomial.

By applying some simple transformations on $G_n$ (such as
alternating transformation etc.), these games can also be used to
show that both variants ([12] and [10]) of Schewe's globally
optimizing technique require the same number of iterations to be
solved, and therefore also super-polynomial time.

Although there are many preprocessing techniques that could be used to simplify the family of games presented here (e.g. decomposition into strongly connected components, compression of priorities, direct solving of simple cycles), such procedures cannot fix the bad performance of strategy iteration on these games, since all known preprocessing techniques can be fooled quite easily without really touching the inner structure of the game.
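
As an illustration of one of the preprocessing techniques named above, compression of priorities can be sketched as follows. This is a minimal sketch in Python and not the PGSOLVER implementation; the function name `compress_priorities` and the representation of the priority function as a dictionary from nodes to integers are assumptions made for this example.

```python
def compress_priorities(priority):
    """Map priorities onto the smallest range that keeps their order and parity.

    `priority` is a hypothetical dict from node names to natural numbers.
    """
    compressed = {}
    prev_old = None
    prev_new = None
    for p in sorted(set(priority.values())):
        if prev_old is None:
            # The smallest priority becomes 0 or 1, depending on its parity.
            new = p % 2
        elif p % 2 == prev_old % 2:
            # Neighbouring priorities of equal parity can share one value.
            new = prev_new
        else:
            # A parity change needs the next larger value.
            new = prev_new + 1
        compressed[p] = new
        prev_old, prev_new = p, new
    return {node: compressed[p] for node, p in priority.items()}


game_priorities = {"a": 3, "b": 7, "c": 8, "d": 12, "e": 13}
print(compress_priorities(game_priorities))
```

The mapping is monotone and parity-preserving, so the parity of the highest priority occurring infinitely often in any play is unchanged. A family of games can defeat this step simply by using every priority in a range, which leaves nothing to compress.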

The same applies to running different algorithms simultaneously: it is not very complicated to combine different worst-case games in such a way that each algorithm that tries to solve the whole game is slowed down by the part that belongs to its worst-case example.

Parity games are widely believed to be solvable in polynomial time, yet no known algorithm performs better than super-polynomially. Voge presented his strategy iteration technique in his PhD thesis eight years ago, and this class of solving procedures has since generally been regarded as the best candidate to give rise to an algorithm that solves parity games in polynomial time. Unfortunately, the two most obvious improvement policies, namely the locally and the globally optimizing technique, are not capable of doing so.

We think that strategy iteration is still a promising candidate for a polynomial-time algorithm, although it may be necessary to alter more of it than just the improvement policy. The main problem of the algorithm (and the policies) is that recurring substructures are not handled in such a way that a combination of edges that was profitable before is applied again. The reason is that possibly not all edges belonging to that profitable combination are improvement edges, hence that combination cannot be selected in a single improvement step.
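
To make the notion of an improvement edge concrete, the following Python sketch shows one step of a locally optimizing policy. It is a simplified illustration under stated assumptions, not the algorithm of Voge and Jurdzinski: valuations are abstracted to plain numbers (the actual algorithm compares structured valuations with a dedicated ordering), and the names `improve_locally`, `successors` and `valuation` are hypothetical.

```python
def improve_locally(strategy, successors, valuation):
    """One improvement step: each player-0 node switches to its best successor.

    strategy   -- dict: player-0 node -> currently chosen successor
    successors -- dict: player-0 node -> list of all successors
    valuation  -- dict: node -> number (assumed stand-in for the real valuation)
    """
    improved = {}
    for node, current in strategy.items():
        best = max(successors[node], key=lambda w: valuation[w])
        # Only improvement edges (strictly better successors) are switched.
        if valuation[best] > valuation[current]:
            improved[node] = best
        else:
            improved[node] = current
    return improved


successors = {"u": ["a", "b"], "v": ["b", "c"]}
valuation = {"a": 1, "b": 3, "c": 2}
print(improve_locally({"u": "a", "v": "b"}, successors, valuation))
```

Each node is judged in isolation against the current valuation. A set of edges that would only pay off when switched together, but whose members are not all improvement edges on their own, is therefore never selected by such a step.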

Therefore we believe that it would be an interesting ap- 
proach to add some kind of memorization of profitable sub- 
structures that can be applied as a whole under certain con- 
ditions that are weaker than requiring all edges of the sub- 
structure to be improvement edges but strong enough to en- 
sure the soundness of the algorithm. 

Acknowledgements. I am indebted to Martin Lange and 
Martin Hofmann for their guidance and numerous inspiring 
discussions on the subject. 
b



°j>0 



A. Appendix 

We will apply the following additional notations for $\alpha \in \{0,1\}^n$:

• $\nu_\alpha := \max\{j < n \mid \forall k < j.\ \alpha_k = 0\}$

• $\alpha|_j := (\alpha_{n-1} \ldots \alpha_{j+1}\, 0 \ldots 0)$
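
The two notations can be made concrete with a small Python sketch. It assumes that a bit vector is stored as a tuple with the least significant bit first, so that `alpha[k]` is the bit with index k, and it reads the bound in the first definition as j < n, as printed. The function names `nu` and `truncate` are chosen for this example only.

```python
def nu(alpha):
    """Largest j < n such that all bits with index below j are 0."""
    n = len(alpha)
    return max(j for j in range(n) if all(alpha[k] == 0 for k in range(j)))


def truncate(alpha, j):
    """Keep the bits with index above j and replace the bits j, ..., 0 by 0."""
    return tuple(0 if k <= j else bit for k, bit in enumerate(alpha))


alpha = (0, 0, 1, 0)           # bits with index 0, 1, 2, 3
print(nu(alpha))               # index of the lowest set bit
print(truncate((1, 1, 0, 1), 1))
```

For a non-zero bit vector, `nu` returns the index of the least significant set bit under this reading.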

The initialization strategy family cr^ ^ where —2 < 
/3 < 8 is defined as follows: er° n ^ := 

s if ^ = —1,8 
a; if = -2 
d otherwise 
s if ^ = —1,8 
a; if/3^-lA/3<j 
bj-i otherwise 
if j = 0A/3> 7 
otherwise 
fc if/3>7 
| p otherwise 
« if = —1,8 
otherwise 

y if /? =-2 

w otherwise 

'/o if/? = -2,8 
w otherwise 

a 3 j +2 if /? = -2 
s if /? = -IV 

(/3 = 8Aj>0) 



f 



y 



9j 

d 

w 

Zi 



c 

a 5 

l/j 

ko 

a 3 j+i 
ai 
04 

a 3j+3 

a 

a 3 

a 6 
hi 



if/? = 0,1 

if/3 = 3,4 

if/3 = 6, 7 A j > 

otherwise 

if = 7, 8 A j > 

if /3= -2 

if/? = 2,3 

if/3 = 5, 6 A j > 

otherwise 

if/3 < 1 

if /S =1,2 

if jfl = 4,5 

if /? > 7 A j > 

otherwise 



x 
V 
V 



b 



°j>o 



d 



t 



w 



y 



9j 



The counting strategy family o4 Q m 
and — 2 < /3 < 7„ is defined as follows: a^ n a ^ 

if/3=-2, 7a 
if /? = -1 
otherwise 

if/3=-2, 7a 
if - 2 < < j, 7a 
otherwise 
if/3= -2 
otherwise 
, ifa j = lV 

(/3 > 7a - 1 A j = fl a ) 
i otherwise 
, a if < 7„ - 1 
la otherwise 
if/3=-2, 7a 
otherwise 
if a = 1 A < 7q 
, a if a = 1 A = -f a 
l„ a otherwise 

if «o = 1 
otherwise 

if a = 1 V = 7„ 
otherwise 

if = 7 Q A j < ii a 



where a ^ 0„,1„ 



if j < f a |j < nA 

(/? < 7a V < j) 
otherwise 
if aj ; = A0 = -1,0 



ifaj 
if 



= 0A/3 
0A/3 = 



7q 



IV 



^ < 



a/3-1 

''(la 

a 3 j+3 

a/3-1 
lit 



if /? > A /3 =3 A otj = OA 

(/3 < 7ct - 1 V Ma < j) 
otherwise 

if /? = -2 A ^ = A (aj 
(aT=0A 7a - <3j + l)) 

if /? > A =3 2A 
a i = 0A(/3<7 C[ -2V/* a < 

ifa+=0A/?>7 a -l 

otherwise 

if aj ; = A /? < 

if > A =3 1 A aj = OA 

(/? < 7q V ^ Q < j) 
otherwise 
The finalization strategy family cr? ^ where —2 < (3 < 



3 • n is defined as follows: <r^ n ^ 



b 



b j>0 
i — > < 




1 X 
h-> kj 
t-> k Q 
!-* 2/ 


y 




e 3 


-/i 


fj 




<9j 





P 

a; 
s 



if /? = -2 
if /?= -1 
otherwise 
if /3 = -2 
if - 2 < < j 
i otherwise 
if j = n — 1 
otherwise 
if /3 = -2 
otherwise 
if/3= -2 
otherwise 



As an example of how to prove the obligations of Lemma 7, we show the following:

Example 9. Let n > 0, a ^ 0„,1~,1„ and < /3 < 
7a — 2. Then: 

T-lOC/ 1 \ 1 

X S„ l°(n,a :/ 3V — CT (n : a,/3+l) 

froo/ Let n > 0, a ^ 0„, 1~, l n , < /3 < 7 Q - 2, cr := 



CT (n,a,/3+l)' cr 



= 2^ c (<t) andS:= S 



Since the first component of $\Xi(v)$ is $q$ and the third component is irrelevant for all $v$ (there are no nodes less relevant than $q$), we omit the first and third components entirely and identify $\Xi(v)$ with its path component.

First, we observe that 



a\j-i < a\i-! V Hj_i = a|j_iA 
(ay < V (ay = ati A i < j))) 



(1) 



holds. By (1) we can therefore conclude that $\sigma'(z_i) = \sigma^*(z_i)$ for all $i$. Moreover, it is obvious that $\sigma'(y) = \sigma^*(y)$, $\sigma'(w) = \sigma^*(w)$ and thus $\sigma'(x) = \sigma^*(x)$.

Also note that 



holds. Therefore, we can conclude that $\sigma'(k_i) = \sigma^*(k_i)$ for all $i$.

Consider that

$\forall j \neq \nu_\alpha : \Xi(k_j) \prec \Xi(k_{\nu_\alpha})$    (3)

and therefore $\sigma'(s) = \sigma^*(s)$.

Now note that $s$ (directly) and $x$ (via $y$ or $w$ and $z_{\nu_\alpha}$) both lead into $k_{\nu_\alpha}$ under $\sigma$. Hence $\Xi(s) \prec \Xi(x)$, thus $\sigma'(t) = \sigma^*(t)$ and $\sigma'(d) = \sigma^*(d)$. We can conclude that $\sigma'(b_j) = \sigma^*(b_j)$ also holds for all $j$.

Finally, we need to investigate the update activity of the stubborn cycles. It is easy to see that all closed stubborn cycles $i$, i.e. those with $\sigma(e_i) = f_i$, $\sigma(f_i) = g_i$ and $\sigma(g_i) = e_i$, remain closed.

For all open stubborn cycles $i$ the following holds. Assume that $\beta \equiv_3 0$ (the other two cases $\beta \equiv_3 1$ and $\beta \equiv_3 2$ are similar). By definition of $\sigma$ it holds that $\sigma(e_i) = a_{\beta-1}$, $\sigma(f_i) = g_i$ and $\sigma(g_i) = h_i$.

Now note that $\Xi(a_j) \prec \Xi(a_\beta)$ for all $j \neq \beta$. Therefore $\sigma'(e_i) = f_i = \sigma^*(e_i)$, $\sigma'(f_i) = g_i = \sigma^*(f_i)$ and $\sigma'(g_i) = a_\beta = \sigma^*(g_i)$.

□ 



E(a( Zj )) -< 3&) 



a , 



(2) 